Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-4907: Stop client packets processing after server channel closed #2236

Merged
merged 3 commits into from
Mar 25, 2025

Conversation

jiafu1115
Copy link
Contributor

Refer to ZOOKEEPER-4907

We got the error:

2024-11-07 19:03:01,414 [myid:14] - WARN [nioEventLoopGroup-7-25:NettyServerCnxn@537] - Closing connection to /135.224.186.250:47051
java.io.IOException: Len error 794913900
at org.apache.zookeeper.server.NettyServerCnxn.receiveMessage(NettyServerCnxn.java:521)
at org.apache.zookeeper.server.NettyServerCnxn.processMessage(NettyServerCnxn.java:374)
at org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.channelRead(NettyServerCnxnFactory.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead

It cause us very confused about it whether we write some big data into zookeeper. Thus. After trouble shooting, we found that it is just the log/issue when closing the server caused by reelecting the leader sometimes. In actually. We don't send any big data.

So I think we can do one tiny code change to avoid throw the error which causing confusion to reduce trouble shooting effort.

@jiafu1115
Copy link
Contributor Author

jiafu1115 commented Mar 17, 2025

The root cause for the error throw is caused by follow logic:
when closing the server. the follow code don't have chance to get executed:
bb = ByteBuffer.allocate(len); //line 552
so the remain message (body) will be taken as the head msg again and parsed again as the header.
then body length is not right now. sometimes it over the max size.
So I move the server status check before the length check.

@kezhuw
Copy link
Member

kezhuw commented Mar 23, 2025

The root cause for the error throw is caused by follow logic:
when closing the server. the follow code don't have chance to get executed:
bb = ByteBuffer.allocate(len); //line 552
so the remain message (body) will be taken as the head msg again and parsed again as the header.
then body length is not right now. sometimes it over the max size.
So I move the server status check before the length check.

Seems that there are multiple incoming packets during server close. Since processMessage is called sequential in event loop thread, I would suggest to guard against closingChannel before decoding the packet inside processMessage. That is we should not even reading the packet.

@jiafu1115
Copy link
Contributor Author

@kezhuw thanks for comment. I update the code according to your suggestion. You can help to double check. thanks

@jiafu1115
Copy link
Contributor Author

test fail? no relationshiop with this commit?
08:43:08 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.045 s <<< FAILURE! - in org.apache.zookeeper.server.admin.RestoreQuorumTest
08:43:08 [ERROR] testRestoreAfterQuorumLost Time elapsed: 3.805 s <<< ERROR!
08:43:08 java.net.ConnectException: Connection refused (Connection refused)
08:43:08 at java.net.PlainSocketImpl.socketConnect(Native Method)
08:43:08 at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
08:43:08 at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
08:43:08 at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
08:43:08 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
08:43:08 at java.net.Socket.connect(Socket.java:607)
08:43:08 at java.net.Socket.connect(Socket.java:556)
08:43:08 at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
08:43:08 at org.apache.zookeeper.server.admin.SnapshotAndRestoreCommandTest.takeSnapshotAndValidate(SnapshotAndRestoreCommandTest.java:413)
08:43:08 at org.apache.zookeeper.server.admin.RestoreQuorumTest.testRestoreAfterQuorumLost(RestoreQuorumTest.java:56)

Copy link
Member

@tisonkun tisonkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for handling this.

@kezhuw kezhuw changed the title ZOOKEEPER-4907 Shouldn't throw "Len error" when server closing ZOOKEEPER-4907: Stop client packets processing after server channel closed Mar 25, 2025
@kezhuw kezhuw merged commit 6e4ec27 into apache:master Mar 25, 2025
14 checks passed
kezhuw pushed a commit that referenced this pull request Mar 25, 2025
…losed

Reviewers: kezhuw, tisonkun
Author: jiafu1115
Closes #2236 from jiafu1115/jiafu1115-patch-1

(cherry picked from commit 6e4ec27)
Signed-off-by: Kezhu Wang <[email protected]>
@kezhuw
Copy link
Member

kezhuw commented Mar 25, 2025

@jiafu1115 Thank you for your contribution! Merged to master and backported to branch-3.9.

@jiafu1115
Copy link
Contributor Author

@jiafu1115 Thank you for your contribution! Merged to master and backported to branch-3.9.

Thanks ! It is my pleasure ! @kezhuw @tisonkun

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants